Corpus analysis of simultaneous interpretation data for improving real time speech translation
Authors
Abstract
Real-time speech-to-speech (S2S) translation of lectures and speeches requires simultaneous translation with low latency to continually engage the listeners. However, simultaneous speech-to-speech translation systems have predominantly repurposed translation models that are typically trained for consecutive translation, without a motivated attempt to model incrementality. Furthermore, the notion of interpretation is simplified to translation plus simultaneity. In contrast, human interpreters perform simultaneous interpretation by generating target speech incrementally with a very low ear-voice span, using a variety of strategies such as compression (paraphrasing), incremental comprehension, and anticipation through discourse inference and expectation of discourse redundancies. Exploiting and modeling such phenomena can potentially improve automatic real-time translation of speech. As a first step, in this work we identify and present a systematic analysis of phenomena used by human interpreters to perform simultaneous interpretation and elucidate how they can be exploited in a conventional simultaneous translation framework. We perform our study on a corpus of simultaneous interpretation of Parliamentary speeches in English and Spanish. Specifically, we present an empirical analysis of factors such as time constraint, redundancy and inference as evidenced in the simultaneous interpretation corpus.
Similar resources
Constructing a Speech Translation System using Simultaneous Interpretation Data
There has been a fair amount of work on automatic speech translation systems that translate in real-time, serving as a computerized version of a simultaneous interpreter. It has been noticed in the field of translation studies that simultaneous interpreters perform a number of tricks to make the content easier to understand in real-time, including dividing their translations into small chunks, ...
Collection of a Simultaneous Translation Corpus for Comparative Analysis
This paper describes the collection of an English-Japanese/Japanese-English simultaneous interpretation corpus. There are two main features of the corpus. The first is that professional simultaneous interpreters with different amounts of experience cooperated with the collection. By comparing data from simultaneous interpretation of each interpreter, it is possible to compare better interpretat...
Construction of Chunk-Aligned Bilingual Lecture Corpus for Simultaneous Machine Translation
With the development of speech and language processing, speech translation systems have been developed. These studies target spoken dialogues, and employ consecutive interpretation, which uses a sentence as the translation unit. On the other hand, there are a few studies on simultaneous interpreting, and recently, the language resources for promoting simultaneous interpreting r...
Construction and utilization of bilingual speech corpus for simultaneous machine interpretation research
This paper describes the design, analysis and utilization of a simultaneous interpretation corpus. The corpus has been constructed at the Center for Integrated Acoustic Information Research (CIAIR) of Nagoya University in order to promote the realization of the multi-lingual communication supporting environment. The size of transcribed data is about 1 million words, and the corpus would deserve...
Interpreting Unit Segmentation of Conversational Speech in Simultaneous Interpretation Corpus
The speech-to-speech translation system is becoming an important research topic with the progress of the speech and language processing technology. Considering efficiency and the smoothness of the cross-lingual conversation, the simultaneity of the translation processing has a great influence on the performance of the system. This paper describes interpreting unit segmentation of conversational...